Abstract


Multimodal images carry information that can be complementary or redundant. Modeling and combining
this information overcomes several problems of unimodal classification. Although this kind of classification
gives acceptable results, it still does not reach the level of human visual perception, which classifies an
observed scene with ease thanks to the powerful mechanisms of the human brain.


To improve classification in the multimodal image domain, we propose a methodology based on the
Dezert-Smarandache formalism (DSmT) that fuses the combined spectral and dense SURF features
extracted from each modality and pre-classified by an SVM classifier. We then integrate the visual perception
model into the fusion process.


To demonstrate the benefit of salient features in a DSmT fusion process, the proposed methodology
is tested and validated on large datasets extracted from acquisitions on cultural heritage wall paintings. Each
set comprises four imaging modalities covering UV, IR, visible and fluorescence, and the results are
promising.


Keywords: Visual saliency model, Data fusion, DSmT formalism, SVM classifier, Dense SURF features, Spectral 
features, Multimodal images, Classification. 


1. Introduction 


Multimodal imaging has gained increasing importance in computer vision applications, and
significant effort has been put into developing methods for different tasks, such as registration [1][2][3][4],
data fusion [5], representation learning [6] and classification [7]. In classification, a unimodal
image presents various problems, such as noisy, incomplete or distorted data, which often lead
to misclassification. These limitations are overcome by using multimodal images, which are acquired by
multiple sensors for the same object or scene. Each image, or modality, provides
information that can be redundant, because the same area or scene is captured by different sensors,
or complementary to another modality, owing to the diversity of sensor technologies and their physical
interaction mechanisms. Using this set of images together brings a practical benefit: a given problem can be
addressed with a variety of available information, and fusing these data yields a better-quality classification.


However, these data are affected by imperfections such as conflict, ignorance and uncertainty,
which must be handled and taken into account by a dedicated formalism because they reflect an aspect of
reality. Several formalisms exist for this purpose, such as probability theory [8], fuzzy theory [9], belief function
formalism [10] and the Dezert-Smarandache formalism [11][12]. In this work, we rely on the last of these,
which is the most recent and was introduced to deal with highly conflicting and
uncertain data thanks to its rich modeling and the combination operators (PCR5 and PCR6) that it
integrates.


In classification, belief function theory is widely exploited [13][14][15][16]. DSmT,
also known as plausible and paradoxical reasoning, has shown its efficiency in many applications. It was applied
to multi-source remote sensing [17] for supervised classification by integrating contextual
information obtained from an ICM classifier with constraints and temporal information in a hybrid DSmT process
with an adaptive decision rule; the authors also proposed a new decision rule based on the DSmP transformation for
change detection [18]. In [19], the authors present an effective use of DSmT for multiclass
classification by combining two SVM OAA (One-Against-All) implementations using the PCR6 combination rule. A
new method that fuses the attribute type information obtained from a Ground Moving Target Indicator
and an imagery sensor using DSmT for tracking and classification was presented in [20]. Multidate fusion
was proposed in [21][22] for the short-term prediction of winter land cover. DSmT is also used in
medical case retrieval [23], where the authors fuse heterogeneous features from several sensors for
inclusion in CBR systems.


According to our review of the state of the art, all the works we studied disregard the power of perceptual
attention, by which the human brain classifies a scene with ease. Our approach benefits from this ability
by integrating the visual perception model, through DSmT, with the spectral and dense SURF features
pre-classified by SVM, in order to improve classification significantly.


The paper is organized as follows. After a brief presentation of the mathematical background of the DSmT formalism
in section 2, we present the overall system of the proposed method in section 3. Data and experiments are
then given in section 4 to evaluate the performance of our approach on real image datasets. A
conclusion is given in section 5.


2. Mathematical Background of DSmT 


Dezert-Smarandache theory was proposed jointly by Jean Dezert and Florentin Smarandache [24] as an
attempt to overcome the limitations of belief functions in handling highly uncertain and conflicting information.
This theory can be described as follows:


We denote $\Theta = \{\theta_1, \theta_2, \ldots, \theta_N\}$ the discernment space of the N-class classification problem, and $D^\Theta$ the
hyper-power-set [25], that is, the set of subsets of $\Theta$ built with the union of classes and also their intersection, so that
if $X, Y \in D^\Theta$, then $X \cup Y \in D^\Theta$ and $X \cap Y \in D^\Theta$. Each source $S_i$ contributes its belief mass $m_i$ to $X$, known as the
generalized basic belief assignment (gbba), which satisfies the following properties:


$$ m_i(\cdot) : D^\Theta \to [0, 1] \tag{1} $$
$$ m_i(\emptyset) = 0 \tag{2} $$


where $\emptyset$ is the null set, and


$$ \sum_{X \in D^\Theta} m_i(X) = 1 \tag{3} $$


The size of the hyper-power-set is a real limitation in DSmT when N > 6 (N being the number of classes) in the free model [26],
which corresponds to the full hyper-power-set without any constraints. In contrast, the hybrid model [26]
allows integrating constraints, which can be exclusive and refined, and therefore reduces the size of $D^\Theta$.
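

To give a feel for how quickly the free-model hyper-power-set grows, the following minimal sketch enumerates it for small N. It is an illustration under stated assumptions, not part of the proposed method: each class is encoded as the set of Venn-diagram regions it covers, and the function name `hyper_power_set` is hypothetical.

```python
from itertools import combinations


def hyper_power_set(n):
    """Enumerate the free-model hyper-power-set for n classes (small n only).

    Each class theta_i is encoded as the set of Venn regions that contain it,
    so unions and intersections of classes become plain set operations.
    """
    # One Venn region per non-empty subset of class indices.
    regions = [frozenset(c) for r in range(1, n + 1)
               for c in combinations(range(n), r)]
    # theta_i covers every region whose label contains i.
    generators = [frozenset(k for k, reg in enumerate(regions) if i in reg)
                  for i in range(n)]
    elements = set(generators)
    frontier = set(generators)
    # Close the set under union and intersection.
    while frontier:
        new = set()
        for a in frontier:
            for b in elements:
                for c in (a | b, a & b):
                    if c not in elements:
                        new.add(c)
        elements |= new
        frontier = new
    elements.add(frozenset())  # the empty set belongs to the hyper-power-set
    return elements


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, len(hyper_power_set(n)))
```

The enumeration is only practical for very small N, since every newly found element is paired with all known ones. This cost is the reason hybrid-model constraints are attractive in practice.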


The generalized masses assigned by the different sources are then combined, and a new mass distribution
is provided over the elements of $D^\Theta$. The combination step is the kernel of the fusion process, and each formalism
proposes several combination operators. The combination operators available in DSmT are described in detail
in [27]; the most used are the Smets rule, the (normalized) Dempster-Shafer operator, the Yager operator, the Zhang
operator, the DSmH rule, the Dubois and Prade rule, the PCR5 operator for N = 2 sources and the PCR6 operator for
N > 2 sources. To deal with the large number of sources used in this work and the highly uncertain and conflicting
information they provide, we rely on the PCR6 combination rule, which performs well in such situations.


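The generalized belief functions, credibility, denoted Bel(.) or Cr(.), plausibility, denoted Pl(.), and the DSmP
transformation are derived from the basic mass function and are respectively defined from $D^\Theta$ to [0,1] as:


$$ Cr(X) = \sum_{x \in D^\Theta,\ x \subseteq X} m_i(x) \tag{4} $$
$$ Pl(X) = \sum_{x \in D^\Theta,\ x \cap X \neq \emptyset} m_i(x) \tag{5} $$
$$ DSmP_\epsilon(\emptyset) = 0 $$
$$ DSmP_\epsilon(X) = \sum_{Y \in G^\Theta} \frac{\sum_{Z \subseteq X \cap Y,\ C(Z) = 1} m(Z) + \epsilon \, C(X \cap Y)}{\sum_{Z \subseteq Y,\ C(Z) = 1} m(Z) + \epsilon \, C(Y)} \, m(Y), \quad \forall X \in G^\Theta \setminus \{\emptyset\} \tag{6} $$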


$G^\Theta$ can represent the full $D^\Theta$ or a reduced $D^\Theta$ with constraints, depending on the model used (free or hybrid). $\epsilon$ is an
adjustment parameter, and $C(X \cap Y)$ and $C(Y)$ are respectively the cardinalities of $X \cap Y$ and $Y$.


The last step of the DSmT process is the final decision, which is a real challenge in many applications.
Since this work aims to improve classification, we have to decide to which simple class, also called
singleton class, each pixel belongs. There are two ways to do so: taking the decision on the
maximum of the generalized basic belief mass (gbba), or on one of the generalized belief functions already computed, as
follows:


• Maximum of credibility Cr(.), which is widely used in many applications [28] and is considered a
pessimistic decision.


• Maximum of plausibility Pl(.), which is considered an optimistic decision.


• Maximum of DSmP, which is a compromise between the two decisions above, based on
a probabilistic transformation P(.) lying in the interval [Cr(.), Pl(.)].
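

The three decision criteria above can be illustrated with a small sketch. It assumes exclusive classes, so each class is a single atomic region and the cardinality C(.) is simply the set size. The names `credibility`, `plausibility`, `dsmp` and `decide`, and the mass values in the example, are hypothetical.

```python
def credibility(masses, x):
    """Cr(X): sum of the masses of non-empty focal elements included in X."""
    return sum(v for a, v in masses.items() if a and a <= x)


def plausibility(masses, x):
    """Pl(X): sum of the masses of focal elements that intersect X."""
    return sum(v for a, v in masses.items() if a & x)


def dsmp(masses, x, eps=1e-3):
    """DSmP_eps(X), with cardinality taken as the number of atoms in a set."""
    total = 0.0
    for y, m_y in masses.items():
        if not y:
            continue
        common = x & y
        # Masses of the singletons inside X n Y and inside Y, plus the eps terms.
        num = sum(masses.get(frozenset([z]), 0.0) for z in common) + eps * len(common)
        den = sum(masses.get(frozenset([z]), 0.0) for z in y) + eps * len(y)
        total += num / den * m_y
    return total


def decide(masses, classes, eps=1e-3):
    """Return the singleton class with the largest DSmP value."""
    return max(classes, key=lambda c: dsmp(masses, frozenset([c]), eps))


if __name__ == "__main__":
    # Illustrative masses only: m(A) = 0.5, m(B) = 0.2, m(A u B) = 0.3.
    m = {frozenset("A"): 0.5, frozenset("B"): 0.2, frozenset("AB"): 0.3}
    a = frozenset("A")
    print(credibility(m, a), dsmp(m, a), plausibility(m, a))
    print(decide(m, ["A", "B"]))
```

In this example the DSmP value of class A lies between its credibility and its plausibility, which is the compromise behaviour described in the last bullet.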


3. Overall System 
3.1. Pre-processing 


Generally, the pre-processing that precedes classification aims to eliminate the imperfections that taint
the information through operations such as filtering or gradient computation. However, in classification based on
theories of uncertainty, these imperfections are preserved, modeled and combined to help make a
decision.


Registration is the pre-processing usually applied in a fusion process. It aims to establish the correspondence
between two or more images of a scene obtained from one or several sensors, potentially at different spatial
positions and scales, by finding optimal spatial and radiometric transformations between the images.


In the case of multimodal images, registration is difficult because of the significant differences between images
[29][30]. In a previous work, we proposed an original methodology for the
registration of multimodal imaging inputs, in which we exploit the scale- and rotation-invariant SURF
descriptors to identify and describe the interest points, and we introduce a relevance
filtering based on both SURF distance and orientation features in the matching step [1].


3.2. Feature Extraction


Feature extraction is a pivotal step in the classification process. It aims to highlight the relevant features that
correspond to the various classes, and an appropriate choice of extracted features
improves the performance of the classification step. Spectral, spatial and perceptual features are extracted in this
work.


3.2.1. Spectral Information 


Spectral information is used in a wide range of classification methods. In this work, we extract the
spectral values of each pixel as a vector of attributes and then convert them to the CIELAB color space for a
better correlation with human color processing.
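

As an illustration of the color conversion step, the sketch below converts one pixel to CIELAB. The paper does not state the input color space, so the code assumes 8-bit sRGB values and a D65 reference white; the function name `srgb_to_lab` is hypothetical.

```python
def srgb_to_lab(r, g, b):
    """Convert one 8-bit sRGB pixel to CIELAB (D65 white point assumed)."""

    def linearize(c):
        # Undo the sRGB transfer function.
        c = c / 255.0
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    rl, gl, bl = linearize(r), linearize(g), linearize(b)

    # Linear sRGB to CIE XYZ.
    x = 0.4124564 * rl + 0.3575761 * gl + 0.1804375 * bl
    y = 0.2126729 * rl + 0.7151522 * gl + 0.0721750 * bl
    z = 0.0193339 * rl + 0.1191920 * gl + 0.9503041 * bl

    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 reference white

    def f(t):
        delta = 6.0 / 29.0
        if t > delta ** 3:
            return t ** (1.0 / 3.0)
        return t / (3.0 * delta ** 2) + 4.0 / 29.0

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return 116.0 * fy - 16.0, 500.0 * (fx - fy), 200.0 * (fy - fz)


if __name__ == "__main__":
    print(srgb_to_lab(255, 255, 255))  # lightness close to 100, a and b close to 0
    print(srgb_to_lab(0, 128, 0))      # a green pixel has a negative a component
```

Applying this function to every pixel of a modality gives a three-component spectral attribute vector per pixel.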


3.2.2. Dense SURF Description 


Speeded-Up Robust Features (SURF), proposed by Herbert Bay [31], is a spatial descriptor that originally consists
of two phases: detection and description of keypoints. In a previous work [32], we proposed to skip
the detection phase and to apply the description phase to each pixel of the image. This is done by first
assigning to each pixel the dominant orientation, calculated by combining the Haar wavelet responses within a
circular neighborhood around the pixel, and then creating 4 x 4 sub-regions around the pixel. In each sub-region,
pixel-wise Haar wavelet responses are computed and summed up to form a 64-element
descriptor.
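

The structure of the 64-element descriptor (4 x 4 sub-regions, four sums each) can be sketched as follows. This is a simplified illustration, not the implementation of [32]: it omits the dominant-orientation step (upright variant), uses central differences as a stand-in for Haar wavelet responses, and assumes the four sums of the standard SURF descriptor. The function name `dense_descriptor` and the `cell` parameter are hypothetical.

```python
def dense_descriptor(img, row, col, cell=5):
    """Upright SURF-like 64-element descriptor centred on pixel (row, col).

    img is a list of rows of grey values. Each of the 4 x 4 sub-regions
    contributes (sum dx, sum dy, sum |dx|, sum |dy|).
    """
    height, width = len(img), len(img[0])

    def px(r, c):
        # Clamp coordinates so that border pixels can also be described.
        return img[min(max(r, 0), height - 1)][min(max(c, 0), width - 1)]

    half = 2 * cell
    desc = []
    for gr in range(4):
        for gc in range(4):
            sdx = sdy = sadx = sady = 0.0
            r0 = row - half + gr * cell
            c0 = col - half + gc * cell
            for r in range(r0, r0 + cell):
                for c in range(c0, c0 + cell):
                    dx = px(r, c + 1) - px(r, c - 1)  # horizontal response
                    dy = px(r + 1, c) - px(r - 1, c)  # vertical response
                    sdx += dx
                    sdy += dy
                    sadx += abs(dx)
                    sady += abs(dy)
            desc.extend([sdx, sdy, sadx, sady])

    # Normalize to unit length to reduce sensitivity to contrast.
    norm = sum(v * v for v in desc) ** 0.5
    return [v / norm for v in desc] if norm > 0 else desc


if __name__ == "__main__":
    ramp = [[float(c) for c in range(40)] for _ in range(40)]
    d = dense_descriptor(ramp, 20, 20)
    print(len(d), d[:4])
```

Calling the function for every pixel yields one 64-element vector per pixel, which can then be concatenated with the spectral attributes.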


3.2.3. Saliency Information 


Based on a comparative analysis of saliency detection on our multimodal data [33], we extract the
saliency features using the method proposed by Rahtu et al. [34]. This method uses local feature contrast
in illuminance and color, mapped to a feature space F(x) that is divided into disjoint bins. A saliency measure is
calculated by applying a sliding window W divided into an inner window K and a border B, under the hypothesis
that points in K are salient and points in B are not. The measure can be defined as a conditional probability and
computed through the Bayes formula as


$$ S_0(x) = \frac{h_K(x) \, p_0}{h_K(x) \, p_0 + h_B(x) \, (1 - p_0)} \tag{7} $$


with $0 < p_0 < 1$, $h_K(x) = P(F(x) \mid H_0)$ and $h_B(x) = P(F(x) \mid H_1)$. A regularized saliency measure is then introduced to make it more
robust to noise.
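

A minimal sketch of the Bayes computation for one window position is given below. It assumes that the feature values have already been quantized into bin indices and that the two conditional distributions are estimated as normalized histograms over the inner window K and the border B. The function name `window_saliency` and the default prior `p0 = 0.5` are hypothetical, and the regularization step is not included.

```python
from collections import Counter


def window_saliency(inner_bins, border_bins, p0=0.5):
    """Posterior probability of saliency for each feature bin in one window.

    inner_bins and border_bins hold the bin index of every pixel in the
    inner window K and in the border B respectively.
    """
    if not inner_bins or not border_bins:
        raise ValueError("both windows must contain at least one pixel")
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must lie strictly between 0 and 1")

    h_k = Counter(inner_bins)
    h_b = Counter(border_bins)
    n_k, n_b = len(inner_bins), len(border_bins)

    saliency = {}
    for b in set(h_k) | set(h_b):
        p_k = h_k[b] / n_k  # likelihood of the bin under "salient"
        p_b = h_b[b] / n_b  # likelihood of the bin under "not salient"
        saliency[b] = p_k * p0 / (p_k * p0 + p_b * (1.0 - p0))
    return saliency


if __name__ == "__main__":
    print(window_saliency([1, 1, 1, 2], [2, 2, 2, 2]))
```

A bin that appears only inside K receives a saliency of 1, while a bin that dominates the border receives a low value.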


The motivation for integrating saliency information into the fusion process is that visual
perception usually succeeds in classifying any object or scene with ease.


3.3. SVM Pre-Classification 


The support vector machine (SVM) is a supervised classification method introduced by Vapnik [35][36], widely used in
classification applications thanks to its ability to deal with high-dimensional data. Basically, it is
designed for binary classification and finds an optimal hyperplane that separates two linearly separable classes. When the
classes are not linearly separable, the feature space is mapped to a higher-dimensional feature space where the
classes are separable, using a kernel function K that should fulfill Mercer's conditions. The most widely used kernel
is the radial basis function (RBF), for which the decision function is expressed as follows:


$$ h(x) = \mathrm{sign}\left( \sum_i \alpha_i \, y_i \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right) \right) \tag{8} $$


where $\alpha_i$ are Lagrange multipliers, and the associated kernel function is


$$ k(x, x') = \exp\left( -\frac{\|x - x'\|^2}{2\sigma^2} \right) \tag{9} $$


For multiclass problems, two main approaches have been proposed: the One-Versus-Rest approach, in which k
binary SVM classifiers are constructed for a k-class classification, and the One-Versus-One approach, in which
k(k-1)/2 binary classifiers are trained, one for each pair of classes.
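

The RBF decision function and the number of binary classifiers needed by each multiclass strategy can be sketched as follows. This is an illustration only: the support vectors, multipliers and labels in the example are made up, the optional `bias` term is an assumption of the sketch, and all function names are hypothetical.

```python
import math


def rbf_kernel(x, x_prime, sigma):
    """Gaussian RBF kernel exp(-||x - x'||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def decision(x, support_vectors, alphas, labels, sigma, bias=0.0):
    """Sign of the kernel expansion over the support vectors (+1 or -1)."""
    score = sum(a * y * rbf_kernel(x, sv, sigma)
                for sv, a, y in zip(support_vectors, alphas, labels)) + bias
    return 1 if score >= 0 else -1


def n_binary_classifiers(k):
    """Number of binary SVMs required for a k-class problem."""
    return {"one_versus_rest": k, "one_versus_one": k * (k - 1) // 2}


if __name__ == "__main__":
    svs = [[0.0, 0.0], [3.0, 3.0]]
    print(decision([0.2, 0.1], svs, [1.0, 1.0], [1, -1], sigma=1.0))
    print(n_binary_classifiers(4))
```

For the four classes considered later in this paper, One-Versus-Rest needs four binary classifiers, whereas One-Versus-One would need six.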


To generate the probabilities for DSmT, we perform a pre-classification [32] based on
the combination of spectral information (section 3.2.1) and dense SURF information (section 3.2.2), using an SVM
classifier with an RBF kernel to handle the non-linear, high-dimensional data in our multimodal dataset, and the
One-Versus-Rest approach to deal with the incomplete information provided by the various modalities.


3.4. DSmT Classification
3.4.1. Mass function estimation 


The mass function estimation step is crucial in the fusion process, because this is where imperfections such as uncertainty,
imprecision and paradox are introduced. The masses are most commonly generated from the probabilities given by a
pre-classification. The SVM classification of the k images generates the matrices of probabilities $P(x \mid \theta_i)$ of
pixels belonging to the singleton classes of the frame of discernment $\Theta = \{\theta_1, \theta_2, \ldots, \theta_n\}$; the same holds for the k
saliency maps generated using the method proposed in [34]. Each source (modality or saliency map), denoted
$S_i$ $(i = 1, \ldots, K)$, gives the probability of belonging to one or two classes and to their complementary classes,
which represents the mass of partial ignorance. Based on [19], we denote $\bar{\Theta} = \{\bar{\theta}_1, \ldots, \bar{\theta}_n\}$ the set of
complementary classes, and the gbba mass of each source is given by:


$$ m_S(\theta_i) = \frac{P(x \mid \theta_i)}{z}, \quad \forall \theta_i \in \Theta $$
$$ m_S(\bar{\theta}_i) = \frac{P\left(x \mid \bigcup_{1 \le j \le n,\ j \ne i} \theta_j\right)}{z}, \quad \forall \bar{\theta}_i \in \bar{\Theta} \tag{10} $$
$$ m_S(\emptyset) = 0 $$


where $z = \sum_{j=1}^{n} P(x \mid \theta_j)$ is a normalization term used to ensure that $\sum m = 1$.
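

The normalization idea behind this step can be sketched in a few lines. The sketch assumes that each source supplies a score for its emphasized class (or classes) and for the corresponding complement, and it simply rescales these scores so that the masses sum to one. It does not claim to reproduce the exact normalization term of [19]; the function name `estimate_masses` and the numeric values are hypothetical.

```python
def estimate_masses(scores):
    """Turn non-negative scores on focal elements into a normalized gbba.

    scores maps a frozenset of class labels (a focal element) to the
    probability given by the pre-classification. Empty focal elements and
    zero scores are dropped, so the mass of the empty set stays at 0.
    """
    kept = {focal: p for focal, p in scores.items() if focal and p > 0}
    z = sum(kept.values())  # normalization term
    if z <= 0:
        raise ValueError("at least one focal element needs a positive score")
    return {focal: p / z for focal, p in kept.items()}


if __name__ == "__main__":
    frame = frozenset({"WO", "WR", "GO", "GR"})
    wo = frozenset({"WO"})
    # Illustrative values for a source that emphasizes the WO class.
    masses = estimate_masses({wo: 0.92, frame - wo: 0.15})
    print(masses)
```

The mass placed on the complement of the emphasized class plays the role of partial ignorance for that source.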
3.4.2. Combination of masses and decision 


The estimated masses must be combined with an appropriate rule that handles the conflict generated by the
different sources $S_i$. In this work, we use the PCR6 rule [37] in the combination step because it shows better
performance than all the combination rules cited in the previous section when tested on our datasets. The
PCR6 rule is computed as follows.


Considering N > 2 independent sources, the combined masses $m_{PCR6}(\cdot)$ are
computed as follows:


$$ m_{PCR6}(\emptyset) = 0, \quad \text{and} \quad \forall X \in D^\Theta \setminus \{\emptyset\}: $$
$$ m_{PCR6}(X) = m_{12 \ldots N}(X) + \sum_{\substack{X_1, X_2, \ldots, X_N \in D^\Theta \setminus \{\emptyset\} \\ X_1 \cap X_2 \cap \ldots \cap X_N = \emptyset}} \left[ \sum_{r=1}^{N} \delta_{X_r}^{X} \, m_r(X_r) \right] \cdot \frac{m_1(X_1) \, m_2(X_2) \cdots m_N(X_N)}{m_1(X_1) + m_2(X_2) + \ldots + m_N(X_N)} \tag{11} $$


where


$$ \delta_{X_r}^{X} = \begin{cases} 1 & \text{if } X = X_r \\ 0 & \text{if } X \neq X_r \end{cases} \tag{12} $$


and the mass $m_{12 \ldots N}(X) \equiv m_\cap(X)$ corresponds to the conjunctive consensus on $X$ between the N > 2 sources.


Once the combination step is completed, we calculate the generalized belief functions and use the DSmP
probabilistic transformation, which converts the combined mass measure into a probability measure using Eq. (6), to
make the final decision.
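

The PCR6 combination described in this section can be sketched as follows. The sketch assumes exclusive classes, so that focal elements are sets of class labels and an empty intersection signals a conflict; in the free DSm model intersections would instead be kept as elements of the hyper-power-set. The function name `pcr6` and the example masses are hypothetical.

```python
from itertools import product


def pcr6(sources):
    """Combine several mass functions with the PCR6 rule.

    sources is a list of dicts mapping frozensets of class labels to masses.
    Non-conflicting products go to the intersection (conjunctive consensus);
    each conflicting product is given back to the focal elements involved,
    proportionally to the masses they contributed.
    """
    combined = {}
    for combo in product(*(src.items() for src in sources)):
        focal_sets = [focal for focal, _ in combo]
        masses = [mass for _, mass in combo]
        joint = 1.0
        for mass in masses:
            joint *= mass
        if joint == 0.0:
            continue
        inter = frozenset.intersection(*focal_sets)
        if inter:
            combined[inter] = combined.get(inter, 0.0) + joint
        else:
            total = sum(masses)
            for focal, mass in combo:
                combined[focal] = combined.get(focal, 0.0) + mass * joint / total
    return combined


if __name__ == "__main__":
    a, b = frozenset("A"), frozenset("B")
    m1 = {a: 0.6, b: 0.4}
    m2 = {a: 0.2, b: 0.8}
    print(pcr6([m1, m2]))
```

Because every conflicting product is fully redistributed, the combined masses still sum to one. The loop enumerates all combinations of focal elements, so its cost grows quickly with the number of sources and focal elements.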


4. Data and Experiments 
4.1. Data 


Large sets of multimodal images acquired on wall paintings of the Germolles palace are used to
demonstrate the proposed method. This palace was offered by Philip, Duke of Burgundy, to his wife Margaret of
Flanders in 1380, and it is the only castle of the Dukes of Burgundy that remains so well preserved. Its wall
paintings were restored between 1989 and 1991, but no conservation report of the applied
restoration exists. To distinguish the original from the restored areas, the conservator of Germolles used
multimodal imaging, which is a fast and relatively inexpensive solution for the
examination of large areas of wall paintings. This technical photography consists of recording a set of images
with a commercial digital camera that has been modified by removing the thermal filter
normally positioned in front of the CCD. In this way it is possible to record images of reflected visible light
(Vis), reflected infrared light (IRr), reflected ultraviolet light (UVr) and UV-fluorescence (UVf). This set of images
provides information about the optical behaviour of the surface under the different types of light
and therefore helps to distinguish the original portions of the wall paintings from recent repainting.


For illustration purposes, we select an area of the south wall of Margaret's dressing room, shown in
Figure 1. This area shows a large white P (for Philip), a motif that covers the walls, which are painted in green. It is
captured in four modalities: VIS, UVF, UVR and IRR. Each modality measures 3744x5616 pixels. The IRR modality
clearly shows the non-original parts of the green surface. The UV-induced fluorescence modality
shows a relatively strong fluorescence corresponding to the remains of an old, original paint layer over the white.
The UVR image helps to distinguish repainted from original areas over the white of the letter P.


Figure 1 Multi-modal images of the same area (panel label: UVF)


4.2. Experiments 


The adopted methodology can be divided into four steps, as illustrated in figure 2. It starts
with pre-processing, in which each image is aligned with the VIS image, used as the reference image.


Figure 2 A representative illustration of the workflow (boxes: Registration; Feature Extraction; SVM Classification; Spectral information; Saliency information; Mass Function Estimation; PCR6 Combination Rule; DSmP Decision Rule; Final Classified Image)


In the second step, four classes are identified: White original (WO), White repainted (WR), Green original
(GO) and Green repainted (GR). Spectral and dense SURF information is then extracted and used jointly as the
input of the SVM classifier with the RBF kernel. In parallel, saliency information is extracted using the
method proposed in [34]; the resulting maps are shown in figure 3.


Figure 3 Saliency maps (panels: VIS, UVF, UVR, IRR)


The third step is the pre-classification, in which the SVM classifier is applied to the images in order to obtain the
probability matrices of pixels belonging to the classes. Each modality highlights the presence of one or two
classes. The UV-induced fluorescence modality shows a relatively strong fluorescence corresponding to the
remains of an old painted layer of the white (WO), which reaches an accuracy of 92% using SVM; the UVR
modality also emphasizes the WO class, with a classification accuracy of 98%. Infrared light clearly shows the
original and repainted parts of the green surface and reaches an accuracy of 94% [32]. The resulting maps are
presented in figure 4.


Figure 4 Multimodal SVM Classification (panel label: UVF; legend: WO, WR, GO, GR)


The VIS modality reaches an accuracy of 98% when classifying the two classes GO and GR, whereas
this accuracy decreases when classifying four classes because the conflict increases. The classified
image is presented in figure 5.


The last step is the fusion process, which starts by defining the frame of discernment
$\Theta = \{WO, WR, GO, GR\}$. Based on the information obtained from the SVM classification and the saliency maps, some
constraints can be taken into account to reflect the real situation and to reduce the hyper-power-set
$D^\Theta$, for example $WO \cap GR = \emptyset$.


Then the mass functions associated with the emphasized class and its complement in each modality
are computed using equation 10. The PCR6 combination rule is used to combine the calculated masses
according to equation 11 and, as a final task, the decision is taken using the maximum of DSmP.


The final classified map, provided by DSmT only, is given in figure 6, and the final classified map obtained 
using DSmT-Salience is shown in figure 7. 


Figure 5 SVM classification of VIS modality


Figure 6 DSmT classification of multimodal images


The results improve with the integration of the perceptual model into the DSmT process. Visual analysis
of the classification maps shows that the result of the proposed method agrees much better with the ground truth
over the WR and WO classes, and appears closer to reality, than the result obtained using
DSmT only for the same classes, while the map obtained from the unimodal image presents a degraded result in
terms of smoothness and connectivity between classes.


Figure 7 DSmT-Salience classification of multimodal images


To evaluate the performance of the methods and compare the results, we
use the overall accuracy (OA), which is the percentage of correctly classified pixels, and the mean error rate
(MER), which is the percentage of misclassified pixels. Table 1 summarizes the results obtained with the
different methods. The proposed method produces a better overall accuracy,
95.39%, compared with the DSmT classification, which provides an overall accuracy of 91.46%, and the SVM
classification, which gives an overall accuracy of 86.43%. In terms of error rate, the proposed method gives
the lowest MER, 4.61%, compared with the DSmT classification and the SVM classification, which give a MER
of 8.53% and 12.60% respectively.
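

The two evaluation measures can be written down directly from their definitions in this paragraph. The sketch below is illustrative only: the function names and the toy label lists are hypothetical, and the error rate is computed as the percentage of misclassified pixels, as stated above.

```python
def overall_accuracy(predicted, reference):
    """Percentage of pixels whose predicted class matches the ground truth."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * correct / len(reference)


def error_rate(predicted, reference):
    """Percentage of misclassified pixels."""
    return 100.0 - overall_accuracy(predicted, reference)


if __name__ == "__main__":
    pred = ["WO", "WR", "GO", "GR", "GO"]
    truth = ["WO", "WR", "GO", "GR", "GR"]
    print(overall_accuracy(pred, truth), error_rate(pred, truth))
```

In practice the two lists would hold the flattened classified map and the flattened ground-truth map of the same area.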


In conclusion, DSmT with the PCR6 combination rule provides a better result thanks to its
effectiveness in correctly managing the conflicting information provided by the different sources, and
shows a significant classification improvement over the unimodal SVM classification. Moreover, the
integration of saliency information into the fusion process brings a real benefit, owing to the powerful mechanisms
of the human brain in classification tasks.














METHODS                         OA         MER
SVM-Classification              86.430%    12.60%
DSmT-Classification             91.466%    8.53%
DSmT-Salience-Classification    95.39%     4.61%

Table 1 Accuracy and errors of classification results from different methods


5. Conclusion 


In this paper, we have proposed a new method for multimodal image classification. As a first step, we
extract spatial (dense SURF), spectral and saliency information. The extracted spatial and spectral
information is combined and passed to the SVM classifier for the pre-classification step. The SVM classification
results obtained from each modality are then fused using DSmT; the joint use of DSmT and SVM
provides better performance than the unimodal SVM classification. In the second step, the
extracted saliency information is modeled and combined with the SVM classification results using a DSmT
process based on the PCR6 combination rule and the DSmP decision rule. The proposed method yields the best
performance in terms of accuracy and error rate compared with the DSmT-SVM classification and the unimodal SVM
classification.